the Javanese-Indonesian dictionary search application in terms of accuracy and
processing time. Performance testing is used to test the performance of
the algorithm implementations in the application. The test results show that
the Boyer-Moore and Knuth-Morris-Pratt algorithms have an accuracy rate of
100%, and the Horspool algorithm 85.3%. In terms of processing time, the
Knuth-Morris-Pratt algorithm is the fastest with an average of 25 ms, followed by
Horspool with 39.9 ms, while the average for the Boyer-Moore algorithm is
44.2 ms. In the complexity test, the Boyer-Moore algorithm has
an overall n² total of 26n², and Knuth-Morris-Pratt and Horspool 20n² each.
1. INTRODUCTION

Search algorithms are among the fundamental research topics in computer science [1-8], including their
use in dictionaries. A dictionary is a tool for learning languages, whether international, national, or
regional. Searching for vocabulary in a dictionary application takes time before the translation of
the searched word is returned. The search process generally uses a string matching
algorithm as the data search algorithm [9-13]. A string matching algorithm is used both to speed
up the search process and to obtain accurate search results. Several algorithms belong to
this family, including the Boyer-Moore algorithm, the Horspool algorithm, the Knuth-Morris-Pratt algorithm,
and others [14-16].

Research on implementing and improving string matching has been discussed in
previous studies [1, 4, 7, 12, 15, 17, 18]. Building on that work, this research
develops a Javanese-Indonesian dictionary application and compares the performance of three
string matching algorithms, namely Boyer-Moore, Knuth-Morris-Pratt, and Horspool, with
a Speech To Text feature added to the application. These three algorithms are regarded as among the best
string matching algorithms; each uses a different shift table, which lets it search data faster than
other algorithms [3-5, 19, 20]. The performance measures
compared are the accuracy and the average processing time of the search results, so that we know which
string matching algorithm is the best to develop in future research.


2. RESEARCH METHOD 

The performance testing method is used to test the performance of three string matching algorithms,
namely the Boyer-Moore algorithm, the Knuth-Morris-Pratt algorithm, and the Horspool algorithm, in
the Indonesian-Javanese dictionary application. Performance is measured by the level of accuracy and
by the processing time of the search in the application. Performance analysis is carried out in several
stages, which are shown in Figure 1. The first stage is the input of the vocabulary or data string. The second
stage is the search for the string data using one of the string matching algorithms. This stage begins with
pre-processing, a very important phase of text mining [21, 22], which makes the data more structured
until it is ready to be processed [23] by the algorithm used. The last stage is the output of
the calculation performed by the algorithm, in the form of a translation of the vocabulary or search string.


[Figure 1: flowchart with the blocks Input Data String, Preprocessing (Boyer-Moore, KMP, Horspool),
Pattern and Text Matching, Pattern Fits, Translation Process, and Translation.]

Figure 1. Flowchart of the running system
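
The stages above can be sketched in Python as a small lookup pipeline. This is only an illustrative sketch and not the application built in this study: the entry format "INDONESIAN JAVANESE", the names `ENTRIES`, `preprocess`, `builtin_find`, and `translate`, and the normalisation steps are all assumptions. Only the entry "BURUK ALA" is taken from the worked example used later in this section.

```python
from typing import Callable, Optional

# Hypothetical dictionary entries stored as "INDONESIAN JAVANESE".
# Only "BURUK ALA" comes from the paper's worked example; the storage
# format is an assumption made for illustration.
ENTRIES = ["BURUK ALA"]


def preprocess(raw: str) -> str:
    """Assumed pre-processing: trim, collapse whitespace, upper-case."""
    return " ".join(raw.split()).upper()


def builtin_find(pattern: str, text: str) -> int:
    """Stand-in matcher; any of the three algorithms could replace it."""
    return text.find(pattern)


def translate(query: str,
              matcher: Callable[[str, str], int] = builtin_find) -> Optional[str]:
    """Return the first entry that contains the query, or None."""
    pattern = preprocess(query)          # stage 2a: pre-processing
    if not pattern:
        return None
    for entry in ENTRIES:                # stage 2b: pattern and text matching
        if matcher(pattern, entry) != -1:
            return entry                 # stage 3: output the matching entry
    return None


print(translate("  ala "))  # BURUK ALA
```

Passing the matcher as a parameter keeps the pipeline identical for all three algorithms, so that only the matching step differs when they are compared.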


2.1. Boyer-Moore algorithm 

The Boyer-Moore algorithm is one of the most frequently used string search algorithms and is often
implemented in document or database search features, because it is considered the most efficient
in typical applications and compares well with other string search algorithms [24, 25]. The Boyer-Moore
algorithm matches characters starting from the right end of the pattern, moving from right to left over
the text [16, 26, 27]. Systematically, the stages that the Boyer-Moore algorithm performs when
matching strings are as follows [26, 28]:
a. The Boyer-Moore algorithm starts matching the pattern at the beginning of the text.
b. The Boyer-Moore algorithm compares the pattern characters with the corresponding text characters from
right to left until one of the conditions is met, that is, a character mismatches or the whole pattern matches.

The search process in the Boyer-Moore algorithm can be illustrated by searching for the word “ALA” in
“BURUK ALA”.
a. Align the pattern ALA with the text BURUK ALA
Text    : BURUK ALA
Pattern : ALA
b. Determine the shift tables BmBc and BmGs

Tables 1 and 2 show the shift values that the Boyer-Moore algorithm derives from the word being
searched for (the pattern). The BmBc values in Table 1 are obtained by enumerating the pattern from
its last character back to its first, with the counter starting at 0, and recording
the characters that have been found [26, 28, 29]. Table 2 gives the BmGs values.
The counter is incremented by 1 when a character that has not been found before is encountered. Stepping back to
the previous position gives the character "A"; because "A" has been seen previously, its
shift value is 1 [26, 28, 29].
c. Build the iteration table for matching the pattern against the text

Table 3 shows the iterations of pattern and text matching in the Boyer-Moore algorithm. The
Boyer-Moore algorithm stops at the 4th iteration, meaning that the word ALA is found in the text BURUK
ALA at the 4th iteration. Each iteration shifts the pattern by the larger of the two shift values given by the BmBc and
BmGs tables [26, 28, 29].


Table 1. BmBc value
Index    0  1  2
Pattern  A  L  A
BmBc     0  1  0

Table 2. BmGs value
Index    0  1  2
Pattern  A  L  A
BmGs     3  3  1


Table 3. Boyer-Moore algorithm iteration scheme
Iteration  Text (index 0-8)     Pattern  Result
           0 1 2 3 4 5 6 7 8
1          B U R U K   A L A    A L A
2          B U R U K   A L A    A L A
3          B U R U K   A L A    A L A
4          B U R U K   A L A    A L A    Data found
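
The right-to-left comparison and the choice of the larger of two shifts can be sketched in Python as follows. This is the common textbook formulation of Boyer-Moore (a last-occurrence table for the bad-character rule and a strong good-suffix table), given under the assumption that it is close to, but not necessarily identical with, the implementation tested here; its table values and number of alignments need not coincide with Tables 1 to 3. The names `good_suffix_table` and `boyer_moore` are illustrative.

```python
def good_suffix_table(pattern: str) -> list:
    """Good-suffix shifts; shift[j + 1] is used after a mismatch at index j."""
    m = len(pattern)
    shift = [0] * (m + 1)
    border = [0] * (m + 1)
    i, j = m, m + 1
    border[i] = j
    while i > 0:  # case 1: the matched suffix occurs again inside the pattern
        while j <= m and pattern[i - 1] != pattern[j - 1]:
            if shift[j] == 0:
                shift[j] = j - i
            j = border[j]
        i -= 1
        j -= 1
        border[i] = j
    j = border[0]
    for i in range(m + 1):  # case 2: only a prefix of the pattern matches
        if shift[i] == 0:
            shift[i] = j
        if i == j:
            j = border[j]
    return shift


def boyer_moore(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    last = {ch: i for i, ch in enumerate(pattern)}  # bad-character table
    shift = good_suffix_table(pattern)
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # compare right to left
            j -= 1
        if j < 0:
            return s
        bad_char = j - last.get(text[s + j], -1)
        s += max(shift[j + 1], bad_char)  # take the larger of the two shifts
    return -1


print(boyer_moore("ALA", "BURUK ALA"))  # 6
```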


2.2. Knuth Morris Pratt algorithm (KMP) 

The KMP algorithm shifts the pattern differently from the Boyer-Moore algorithm. Broadly speaking, the stages
of the KMP algorithm when performing string matching are [16, 20, 22, 30]:
a. The KMP algorithm starts matching the pattern at the beginning of the text.
b. The KMP algorithm compares the pattern characters with the text characters from left to right,
character by character, until one of the conditions is met.

The search process in the Knuth-Morris-Pratt algorithm can be illustrated by searching for the word
“ALA” in “BURUK ALA”.
a. Align the pattern ALA with the text BURUK ALA
Text    : BURUK ALA
Pattern : ALA
b. Determine the border (edge) values of the pattern

In Table 4, the initial border value is always 0. The border values are calculated from
the pattern characters only. If a character in the pattern repeats a character at the start of the pattern, the border value
becomes 1 and keeps increasing together with j for as long as the repetition continues, while i moves on to the next pattern
character [16, 20, 22, 30].


Table 4. Pattern edge
j     0  1  2
P(j)  A  L  A
b(j)  0  0  1


c. Build the iteration table for matching the pattern against the text

Table 5 shows the iterations of pattern and text matching in the KMP algorithm. The iteration of the KMP
algorithm is similar to that of the Boyer-Moore algorithm; it stops at the 4th iteration. In contrast to
the Boyer-Moore algorithm, the iteration process in KMP is not based on the BmBc and BmGs tables but on the values of
the edges of the pattern.


Table 5. Knuth-Morris-Pratt algorithm iteration scheme
Iteration  Text (index 0-8)     Pattern  Result
           0 1 2 3 4 5 6 7 8
1          B U R U K   A L A    A L A
2          B U R U K   A L A    A L A
3          B U R U K   A L A    A L A
4          B U R U K   A L A    A L A    Data found
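
The edge values and the left-to-right matching can be sketched in Python as follows. This is the standard prefix-function formulation of KMP, offered as an assumption about how the edge table is used rather than as the code tested in this study; the function names `border_table` and `kmp_search` are illustrative. For the pattern ALA, the computed edge values agree with Table 4.

```python
def border_table(pattern: str) -> list:
    """b[j] = length of the longest proper border of pattern[0..j]."""
    b = [0] * len(pattern)
    k = 0  # length of the border currently being extended
    for j in range(1, len(pattern)):
        while k > 0 and pattern[j] != pattern[k]:
            k = b[k - 1]  # fall back to the next shorter border
        if pattern[j] == pattern[k]:
            k += 1
        b[j] = k
    return b


def kmp_search(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    b = border_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):  # the text is scanned left to right once
        while k > 0 and ch != pattern[k]:
            k = b[k - 1]  # reuse the border instead of re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


print(border_table("ALA"))             # [0, 0, 1]
print(kmp_search("ALA", "BURUK ALA"))  # 6
```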
2.3. Horspool algorithm

The Horspool algorithm is a string search algorithm that simplifies
the Boyer-Moore algorithm [15, 16, 31, 32]. Its shifting is simpler than that of
the Boyer-Moore algorithm: the Boyer-Moore algorithm has two shift functions, i.e., the bad-character
shift and the good-suffix shift, whereas the Horspool algorithm uses only the bad-character shift [33].
The pattern pre-processing function in the Horspool algorithm prepares jumps based on
the "bad character", that is, on the text character that mismatches the pattern. The Horspool
shift uses the rightmost character of the current text window to determine the shift
distance. The pattern keeps shifting to the right along the text until the pattern characters match
the text. The search process in the Horspool algorithm can be illustrated by searching for the word “ALA” in
“BURUK ALA”.
a. Align the pattern ALA with the text BURUK ALA
Text    : BURUK ALA
Pattern : ALA
b. Determine the shift table BmBc

Table 6 shows the initial shift values for the Horspool algorithm. The process starts at
the 0th index, the character "A". The counter is incremented by 1 when
a character that has not been found before is encountered. Stepping back to the previous position gives the character "A";
because "A" has been found before, its shift value is 1. The final scheme of
the Horspool matching process is shown in Table 7. In the last iteration, the character
∅ (blank) in the text does not match the character A in the pattern, so the matching process stops because
∅ (empty) does not match.


Table 6. BmBc value
Index    0  1  2
Pattern  A  L  A
BmBc     0  1  0


Table 7. Horspool algorithm iteration scheme
Iteration  Text (index 0-8)     Pattern  Result
           0 1 2 3 4 5 6 7 8
1          B U R U K   A L A    A L A
2          B U R U K   A L A    A L A
3          B U R U K ∅ A L A    A L A    Data not found
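
The bad-character-only shift can be sketched in Python as follows. This is the textbook formulation of Horspool, in which the shift is looked up from the text character under the last position of the window; it is an assumption that the tested implementation resembles it, and the function name `horspool` is illustrative. In this textbook form the shift for a character is its distance from the last pattern position, so its table values differ from Table 6, and on the example text it reports a match at index 6, so it should not be read as reproducing the trace in Table 7.

```python
def horspool(pattern: str, text: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Shift table built from every pattern character except the last one;
    # characters that do not occur there shift by the full pattern length.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    s = 0  # current alignment of the pattern against the text
    while s <= n - m:
        j = m - 1
        while j >= 0 and text[s + j] == pattern[j]:  # compare right to left
            j -= 1
        if j < 0:
            return s
        # Shift by the rightmost character of the current text window.
        s += shift.get(text[s + m - 1], m)
    return -1


print(horspool("ALA", "BURUK ALA"))  # 6
```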


3. RESULTS AND ANALYSIS 

The data used are vocabulary items to be translated into Javanese, taken from
an existing Javanese dictionary. This research also draws on direct
observation of the use of Javanese in the community, which has begun to shift, on journals and
articles that discuss the use of Javanese among the younger generation,
and on several journals that evaluate string matching algorithms that can be used for comparison.

Figure 2 shows the test results for the three algorithms. Based on testing
1500 vocabulary items with 400 experiments, the accuracy of the Horspool algorithm, 85.3%, is lower than that of the KMP and
Boyer-Moore algorithms, which each reach 100%.
Mathematically, the test results are as follows:


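$$\text{Accuracy level} = \frac{\text{number of successful samples}}{\text{total number of samples}} \times 100\%$$

$$\text{Accuracy of Boyer-Moore} = \frac{400}{400} \times 100\% = 100\%$$

$$\text{Accuracy of KMP} = \frac{400}{400} \times 100\% = 100\%$$

$$\text{Accuracy of Horspool} = \frac{341}{400} \times 100\% \approx 85.3\%$$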
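
The accuracy formula can be checked with a few lines of Python. The function name `accuracy_percent` is illustrative. The count of 400 successes out of 400 follows from the reported 100%; the count of 341 for Horspool is an assumption, chosen because it is the whole number of successes out of 400 closest to the reported 85.3%.

```python
def accuracy_percent(successes: int, total: int) -> float:
    """Accuracy level = successful samples / total samples x 100%."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successes / total * 100


# 400 of 400 corresponds to the reported 100% for Boyer-Moore and KMP.
print(accuracy_percent(400, 400))  # 100.0

# 341 is an assumed count (see above); it gives about 85.25%.
horspool_accuracy = accuracy_percent(341, 400)
```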
Figure 3 shows the results of testing the search processing time of the three algorithms. The speed of
each algorithm is measured as the processing time for searching vocabulary. Based on the test results,
the fastest algorithm is Knuth-Morris-Pratt with an average of 25 ms, followed by Horspool with 39.9 ms and
Boyer-Moore with 44.2 ms. Figure 4 shows in detail the speed of each algorithm
when searching each Javanese vocabulary item for its Indonesian translation, where blue indicates the Boyer-Moore algorithm, red
indicates the Knuth-Morris-Pratt algorithm, and green indicates the Horspool algorithm.
Figure 2. Accuracy of the algorithms
Figure 3. Average matching time of the compared algorithms
Figure 4. Execution time of the algorithms
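
An average processing time of this kind can be measured with a small harness such as the Python sketch below. It is not the harness used in this study: the function name `average_time_ms`, the `repeats` parameter, and the use of `str.find` as the stand-in search function are assumptions, and the timings it produces depend on the machine and will not reproduce the values in Figure 3.

```python
import time
from statistics import mean


def average_time_ms(search, queries, text, repeats=5):
    """Average wall-clock time in milliseconds of search(pattern, text)."""
    if not queries or repeats < 1:
        raise ValueError("need at least one query and one repeat")
    samples = []
    for pattern in queries:
        for _ in range(repeats):
            start = time.perf_counter()
            search(pattern, text)
            samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


# Stand-in search function; a Boyer-Moore, KMP, or Horspool function with
# the same (pattern, text) signature could be passed instead.
avg = average_time_ms(lambda p, t: t.find(p), ["ALA", "BURUK"], "BURUK ALA")
print(f"average: {avg:.4f} ms")
```

Running each query several times and averaging reduces the effect of scheduling noise on very short searches.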


The algorithms are also tested on their complexity, that is, on
how much space and time each algorithm needs to run, counted step by step
through the algorithm. After testing the three algorithms, the Boyer-Moore algorithm has
an overall n total of 11n and an n² total of 26n². The Knuth-Morris-Pratt algorithm
has an n total of 8n and an n² total of 20n², and the Horspool algorithm has an n total
of 10n and an n² total of 20n². Figure 5 shows that the Knuth-Morris-Pratt algorithm has
the best time efficiency: its n total is the smallest of the three, and its n² total equals that of the Horspool algorithm and
is smaller than that of the Boyer-Moore algorithm. This means that its processing is
the fastest among the compared algorithms.
Figure 5. Comparison of algorithm complexity testing


4. CONCLUSION
Based on the test results, Boyer-Moore and KMP have a higher level of accuracy than
the Horspool algorithm, while the average processing time of KMP is better than that of Horspool and Boyer-Moore.
The processing time is consistent with the efficiency of KMP, which is better than that of
the Boyer-Moore and Horspool algorithms: KMP has the smallest n total and an n² total no larger than the others.